puzpuzpuz
Contributor

Also changes type of IP columns to IPv4.

@rschu1ze rschu1ze self-assigned this Oct 6, 2025
@rschu1ze
Member

rschu1ze commented Oct 7, 2025

@puzpuzpuz I ran the script locally for c6a.metal and c6a.4xlarge. The former succeeded and I updated your measurements. The latter somehow got stuck:

15958016K ........ ........ ........ ........ 98% 63.2M 13s
15990784K ........ ........ ........ ........ 98% 67.1M 11s
16023552K ........ ........ ........ ........ 98% 57.1M 9s
16056320K ........ ........ ........ ........ 99% 53.6M 7s
16089088K ........ ........ ........ ........ 99% 47.5M 5s
16121856K ........ ........ ........ ........ 99% 61.7M 4s
16154624K ........ ........ ........ ........ 99%  101M 2s
16187392K ........ ........ ........ ....... 100% 49.0M=14m59s

2025-10-06 19:01:52 (17.6 MB/s) - ‘hits.csv.gz’ saved [16608960810/16608960810]

ID int,\n    UserID long,\n    CounterClass byte,\n    OS short,\n    UserAgent short,\n    URL varchar,\n    Referer varchar,\n    IsRefresh byte,\n    RefererCategoryID short,\n    RefererRegionID int,\n    URLCategoryID short,\n    URLRegionID int,\n    ResolutionWidth short,\n    ResolutionHeight short,\n    ResolutionDepth short,\n    FlashMajor byte,\n    FlashMinor byte,\n    FlashMinor2 symbol,\n    NetMajor byte,\n    NetMinor byte,\n    UserAgentMajor short,\n    UserAgentMinor symbol,\n    CookieEnable byte,\n    JavascriptEnable byte,\n    IsMobile byte,\n    MobilePhone short,\n    MobilePhoneModel symbol,\n    Params symbol,\n    IPNetworkID int,\n    TraficSourceID int,\n    SearchEngineID short,\n    SearchPhrase varchar,\n    AdvEngineID short,\n    IsArtifical byte,\n    WindowClientWidth short,\n    WindowClientHeight short,\n    ClientTimeZone short,\n    ClientEventTime timestamp,\n    SilverlightVersion1 byte,\n    SilverlightVersion2 byte,\n    SilverlightVersion3 short,\n    SilverlightVersion4 byte,\n    PageCharset symbol,\n    CodeVersion short,\n    IsLink byte,\n    IsDownload byte,\n    IsNotBounce byte,\n    FUniqID long,\n    OriginalURL varchar,\n    HID int,\n    IsOldCounter byte,\n    IsEvent byte,\n    IsParameter byte,\n    DontCountHits byte,\n    WithHash byte,\n    HitColor char,\n    LocalEventTime timestamp,\n    Age byte,\n    Sex byte,\n    Income byte,\n    Interests short,\n    Robotness short,\n    RemoteIP ipv4,\n    WindowName int,\n    OpenerName int,\n    HistoryLength short,\n    BrowserLanguage symbol,\n    BrowserCountry symbol,\n    SocialNetwork symbol,\n    SocialAction symbol,\n    HTTPError byte,\n    SendTiming int,\n    DNSTiming int,\n    ConnectTiming int,\n    ResponseStartTiming int,\n    ResponseEndTiming int,\n    FetchTiming int,\n    SocialSourceNetworkID short,\n    SocialSourcePage varchar,\n    ParamPrice long,\n    ParamOrderID symbol,\n    ParamCurrency symbol,\n    ParamCurrencyID short,\n    OpenstatServiceName symbol,\n    OpenstatCampaignID symbol,\n    OpenstatAdID varchar,\n    OpenstatSourceID symbol,\n    UTMSource symbol,\n    UTMMedium symbol,\n    UTMCampaign symbol,\n    UTMContent symbol,\n    UTMTerm symbol,\n    FromTag symbol,\n    HasGCLID byte,\n    RefererHash long,\n    URLHash long,\n    CLID int\n) TIMESTAMP(EventTime) PARTITION BY DAY;","error":"table already exists","position":13}
+-----------------------------------------------------------------------------------------------------------------+
|      Location:  |                                              hits  |        Pattern  | Locale  |      Errors  |
|   Partition by  |                                               DAY  |                 |         |              |
|      Timestamp  |                                         EventTime  |                 |         |              |
+-----------------------------------------------------------------------------------------------------------------+
|   Rows handled  |                                          99997497  |                 |         |              |
|  Rows imported  |                                          99997497  |                 |         |              |
+-----------------------------------------------------------------------------------------------------------------+
|              0  |                                           WatchID  |                     LONG  |           0  |
|              1  |                                        JavaEnable  |                     BYTE  |           0  |
|              2  |                                             Title  |                  VARCHAR  |           0  |
|              3  |                                         GoodEvent  |                     BYTE  |           0  |
|              4  |                                         EventTime  |                TIMESTAMP  |           0  |
[...]
|            100  |                                           FromTag  |                   SYMBOL  |           0  |
|            101  |                                          HasGCLID  |                     BYTE  |           0  |
|            102  |                                       RefererHash  |                     LONG  |           0  |
|            103  |                                           URLHash  |                     LONG  |           0  |
|            104  |                                              CLID  |                      INT  |           0  |
+-----------------------------------------------------------------------------------------------------------------+
waiting for rows to become readable...
.
.
.
.
.
.
.
.
.
.

... but the rows never became "readable" (I waited 20 hours). Anyways, since the other measurements (yours and mine) were quite close together, I would expect your measurements on c6a.4xlarge to be accurate as well. Merging - thanks!

@rschu1ze rschu1ze merged commit c7925bc into ClickHouse:main Oct 7, 2025
@puzpuzpuz
Copy link
Contributor Author

> I ran the script locally for c6a.metal and c6a.4xlarge. The former succeeded and I updated your measurements.

TBH I don't see any noticeable difference between your result and what was in the branch, but that's fine. At least, this means that I made no mistake when benchmarking.

> The latter somehow got stuck

This is puzzling. I've never seen the "waiting for rows to become readable..." loop get stuck, but it may indeed take a while to finish (~1.5h total for the import on the smaller box). I'll try reproducing this.
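
One way to keep a wait like this from spinning for hours is to bound it with a deadline. The following is a minimal sketch in Python, assuming the load step polls a row count until it reaches the expected total. The function name, its parameters, the default timeout, and the `get_row_count` callback are all hypothetical and not taken from the benchmark script.

```python
import time
from typing import Callable


def wait_until_readable(
    get_row_count: Callable[[], int],   # hypothetical: returns rows currently visible to queries
    expected_rows: int,
    timeout_s: float = 3 * 60 * 60,     # assumed budget, not a value from the benchmark
    poll_interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_row_count() until it reaches expected_rows or timeout_s elapses.

    Returns the last observed count on success and raises TimeoutError otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        seen = get_row_count()
        if seen >= expected_rows:
            return seen
        if clock() >= deadline:
            raise TimeoutError(
                f"only {seen} of {expected_rows} rows readable after {timeout_s}s"
            )
        print(".", flush=True)  # progress marker, one per poll
        sleep(poll_interval_s)
```

With the import shown above, `expected_rows` would be 99997497. A caller would supply `get_row_count` as whatever query the script already uses to count visible rows, so a hang surfaces as an error with the last observed count instead of an endless run of dots.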

@puzpuzpuz puzpuzpuz deleted the questdb-9.1 branch October 7, 2025 13:39
@puzpuzpuz
Contributor Author

@rschu1ze thanks for the review!

@rschu1ze
Member

rschu1ze commented Oct 7, 2025

> TBH I don't see any noticeable difference between your result and what was in the branch, but that's fine.

That's a good thing :-)
